38 research outputs found

    hapassoc: Software for Likelihood Inference of Trait Associations with SNP Haplotypes and Other Attributes

    Complex medical disorders, such as heart disease and diabetes, are thought to involve a number of genes which act in conjunction with lifestyle and environmental factors to increase disease susceptibility. Associations between complex traits and single nucleotide polymorphisms (SNPs) in candidate genomic regions can provide a useful tool for identifying genetic risk factors. However, analysis of trait associations with single SNPs ignores the potential for extra information from haplotypes, combinations of variants at multiple SNPs along a chromosome inherited from a parent. When haplotype-trait associations are of interest and haplotypes of individuals can be determined, generalized linear models (GLMs) may be used to investigate haplotype associations while adjusting for the effects of non-genetic cofactors or attributes. Unfortunately, haplotypes cannot always be determined cost-effectively when data is collected on unrelated subjects. Uncertain haplotypes may be inferred on the basis of data from single SNPs. However, subsequent analyses of risk factors must account for the resulting uncertainty in haplotype assignment in order to avoid potential errors in interpretation. To account for such uncertainty, we have developed hapassoc, software for R implementing a likelihood approach to inference of haplotype and non-genetic effects in GLMs of trait associations. We provide a description of the underlying statistical method and illustrate the use of hapassoc with examples that highlight the flexibility to specify dominant and recessive effects of genetic risk factors, a feature not shared by other software that restricts users to additive effects only. Additionally, hapassoc can accommodate missing SNP genotypes for limited numbers of subjects.
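    To make the haplotype-uncertainty point concrete, the following is a minimal Python sketch (not part of hapassoc, which is R software) that lists the haplotype pairs compatible with one subject's unphased SNP genotypes. The function name `compatible_haplotype_pairs` and the 0/1/2 allele-count coding are assumptions made for illustration only.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of allele counts per SNP (0, 1 or 2); this coding is
    an assumption of the sketch, not something prescribed by the abstract.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]  # heterozygous SNPs
    # Homozygous SNPs fix the allele on both haplotypes (0 -> 0, 2 -> 1);
    # heterozygous positions get a placeholder that is overwritten below.
    base = [g // 2 for g in genotype]
    pairs = set()
    for alleles in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, a in zip(het, alleles):
            h1[i], h2[i] = a, 1 - a  # the two haplotypes carry opposite alleles
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs
```

    A subject with at most one heterozygous SNP has a single compatible pair, whereas k > 1 heterozygous SNPs give 2^(k-1) pairs; a likelihood method must weight these candidates rather than pick one.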

    elrm: Software Implementing Exact-Like Inference for Logistic Regression Models

    Exact inference is based on the conditional distribution of the sufficient statistics for the parameters of interest given the observed values for the remaining sufficient statistics. Exact inference for logistic regression can be problematic when data sets are large and the support of the conditional distribution cannot be represented in memory. Additionally, these methods are not widely implemented except in commercial software packages such as LogXact and SAS. Therefore, we have developed elrm, software for R implementing (approximate) exact inference for binomial regression models from large data sets. We provide a description of the underlying statistical methods and illustrate the use of elrm with examples. We also evaluate elrm by comparing results with those obtained using other methods.
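    As a small illustration of the conditioning idea in the first sentence, the Python sketch below enumerates the exact conditional distribution for the simplest case: a logistic model with an intercept and one binary covariate, tested at a zero slope. It uses full enumeration and is not the approximate method that elrm implements; the function names and the choice of a probability-ordered two-sided p-value are assumptions of this sketch.

```python
from math import comb

def conditional_distribution(x, y):
    """Distribution of t = sum(x*y) given s = sum(y) when the slope is zero.

    For logit P(y=1) = a + b*x with binary x, s is sufficient for a and t is
    sufficient for b; conditioning on s removes a, leaving a hypergeometric law.
    """
    n1 = sum(x)              # subjects with x = 1
    n0 = len(x) - n1         # subjects with x = 0
    s = sum(y)               # observed number of successes
    lo, hi = max(0, s - n0), min(n1, s)
    total = comb(len(x), s)
    return {t: comb(n1, t) * comb(n0, s - t) / total for t in range(lo, hi + 1)}

def exact_p_value(x, y):
    """Two-sided p-value: total probability of outcomes no likelier than observed."""
    dist = conditional_distribution(x, y)
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    p_obs = dist[t_obs]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in dist.values() if p <= p_obs * (1 + 1e-12))
```

    With several covariates or many subjects, the support of this conditional distribution becomes too large to enumerate, which is the memory problem the abstract describes.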

    A Comparison of Five Methods for Selecting Tagging Single-Nucleotide Polymorphisms

    Our goal was to compare methods for tagging single-nucleotide polymorphisms (tagSNPs) with respect to the power to detect disease association under differing haplotype-disease association models. We were also interested in the effect that SNP selection samples, consisting of either cases, controls, or a mixture, would have on power. We investigated five previously described algorithms for choosing tagSNPs: two that picked SNPs based on haplotype structure (Chapman-haplotypic and Stram), two that picked SNPs based on pair-wise allelic association (Chapman-allelic and Cousin), and one control method that chose equally spaced SNPs (Zhai). In two disease-associated regions from the Genetic Analysis Workshop 14 simulated data, we tested the association between tagSNP genotype and disease over the tagSNP sets chosen by each method for each sampling scheme. This was repeated for 100 replicates to estimate power. The two allelic methods chose essentially all SNPs in the region and had nearly optimal power. The two haplotypic methods chose about half as many SNPs. The haplotypic methods had poor performance compared to the allelic methods in both regions. We expected an improvement in power when the selection sample contained cases; however, there was only moderate variation in power between the sampling approaches for each method. Finally, when compared to the haplotypic methods, the reference method performed as well or worse in the region with ancestral disease haplotype structure.

    Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling

    Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part agrees with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.

    CrypticIBDcheck: An R Package For Checking Cryptic Relatedness In Nominally Unrelated Individuals

    Background: In population association studies, standard methods of statistical inference assume that study subjects are independent samples. In genetic association studies, it is therefore of interest to diagnose undocumented close relationships in nominally unrelated study samples. Results: We describe the R package CrypticIBDcheck to identify pairs of closely-related subjects based on genetic marker data from single-nucleotide polymorphisms (SNPs). The package is able to accommodate SNPs in linkage disequilibrium (LD), without the need to thin the markers so that they are approximately independent in the population. Sample pairs are identified by superposing their estimated identity-by-descent (IBD) coefficients on plots of IBD coefficients for pairs of simulated subjects from one of several common close relationships. Conclusions: The methods implemented in CrypticIBDcheck are particularly relevant to candidate-gene association studies, in which dependent SNPs cluster in a relatively small number of genes spread throughout the genome. The accommodation of LD allows the use of all available genetic data, a desirable property when working with a modest number of dependent SNPs within candidate genes. CrypticIBDcheck is available from the Comprehensive R Archive Network (CRAN).

    Age-dependent variation of genotypes in MHC II transactivator gene (CIITA) in controls and association to type 1 diabetes

    The major histocompatibility complex class II transactivator (CIITA) gene (16p13) has been reported to associate with susceptibility to multiple sclerosis, rheumatoid arthritis and myocardial infarction, recently also to celiac disease at genome-wide level. However, attempts to replicate association have been inconclusive. Previously, we have observed linkage to the CIITA region in Scandinavian type 1 diabetes (T1D) families. Here we analyze five Swedish T1D cohorts and a combined control material from previous studies of CIITA. We investigate how the genotype distribution within the CIITA gene varies depending on age, and the association to T1D. Unexpectedly, we find a significant difference in the genotype distribution for markers in CIITA (rs11074932, P=4 × 10^-5 and rs3087456, P=0.05) with respect to age, in the collected control material. This observation is replicated in an independent cohort material of about 2000 individuals (P=0.006, P=0.007). We also detect association to T1D for both markers, rs11074932 (P=0.004) and rs3087456 (P=0.001), after adjusting for age at sampling. The association remains independent of the adjacent T1D risk gene CLEC16A. Our results indicate an age-dependent variation in CIITA allele frequencies, a finding of relevance for the contrasting outcomes of previously published association studies. Funding: Juvenile Diabetes Research Foundation International (2-2000-570) and (1-2001-873); The Swedish Research Council; Svenska Diabetes Fonden; Barndiabetes Fonden; Novo Nordisk Foundation; Magnus Bergvalls Foundation; Neuropromise (LSHM-CT-2005-018637); NIH grant DK-17047; Swedish Brain Power initiative; Gun and Bertil Stohne’s Foundation; Foundation for Old Servants; Alzheimer Foundation; LIONS Foundation for Research of Age Related Disorders; AFA foundation; Söderberg Foundation; Knut and Alice Wallenbergs Foundation.